On the Use of Default Parameter Settings in the Empirical Evaluation of Classification Algorithms

Authors

  • Anthony J. Bagnall
  • Gavin C. Cawley
Abstract

We demonstrate that, for a range of state-of-the-art machine learning algorithms, the differences in generalisation performance obtained using default parameter settings and using parameters tuned via cross-validation can be similar in magnitude to the differences in performance observed between state-of-the-art and uncompetitive learning systems. This means that fair and rigorous evaluation of new learning algorithms requires performance comparison against benchmark methods with best-practice model selection procedures, rather than using default parameter settings. We investigate the sensitivity of three key machine learning algorithms (support vector machine, random forest and rotation forest) to their default parameter settings, and provide guidance on determining sensible default parameter values for implementations of these algorithms. We also conduct an experimental comparison of these three algorithms on 121 classification problems and find that, perhaps surprisingly, rotation forest is significantly more accurate on average than both random forest and a support vector machine.
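
The comparison protocol described in the abstract (default parameters versus parameters tuned via cross-validation, both scored on held-out data) can be sketched as follows. This is a minimal illustration only, not the authors' experimental code: it swaps in a small pure-Python k-nearest-neighbour classifier for the SVM, random forest and rotation forest studied in the paper, and the synthetic data, the "default" of k=1, the candidate grid, and all function names are assumptions made for the example.

```python
import random

def make_data(n, seed):
    """Two noisy 2-D Gaussian classes (synthetic, illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label else -1.0
        data.append(((rng.gauss(centre, 1.5), rng.gauss(centre, 1.5)), label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (k assumed odd)."""
    nearest = sorted(
        train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, test, k):
    """Fraction of test points classified correctly by k-NN fitted on train."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def cv_accuracy(data, k, folds=5):
    """Mean k-NN accuracy over `folds` cross-validation splits of `data`."""
    scores = []
    for f in range(folds):
        held_out = data[f::folds]
        rest = [p for i, p in enumerate(data) if i % folds != f]
        scores.append(accuracy(rest, held_out, k))
    return sum(scores) / folds

def compare(train, test, default_k=1, grid=(1, 3, 5, 9, 15, 25)):
    """Return (default accuracy, tuned accuracy, selected k) on the test set."""
    # Model selection uses the training data only; the test set is
    # touched once, after the parameter has been chosen.
    best_k = max(grid, key=lambda k: cv_accuracy(train, k))
    return accuracy(train, test, default_k), accuracy(train, test, best_k), best_k

train, test = make_data(200, seed=0), make_data(200, seed=1)
default_acc, tuned_acc, best_k = compare(train, test)
print(f"default k=1: {default_acc:.3f}  tuned k={best_k}: {tuned_acc:.3f}")
```

Keeping the parameter search inside the training data is the design choice that matters here: a benchmark method evaluated this way reflects best-practice model selection rather than whatever default the implementation happens to ship with.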

Similar resources

Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

The rapid growth of information technology (IT) motivates and creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and ultimately maximize their profitability. Many hospitals have large data warehouses containing customer ...

Dependence of Default Probability and Recovery Rate in Structural Credit Risk Models: Empirical Evidence from Greece

The main idea of this paper is to study the dependence between the probability of default (PD) and the recovery rate (RR) in a credit portfolio and to investigate this relationship empirically. We examine the dependence between PD and RR using a theoretical approach. For the empirical methodology, we use bootstrapped quantile regression and simultaneous quantile regression. These methods allow to determinate...

Evaluation and Prediction of the Impact of Parasite Waves and Cell Phone Use by Pregnant Mothers on the Volume of Amniotic Fluid based on Data Mining Algorithms

Introduction: Nowadays, the effects of radiation and constant use of cell phones have led to some problems. These radiations cause disorders in different systems of human body and even in a growing fetus. The aim of this study was to find the effect of using cell phone and internet by pregnant women on the amount of amniotic fluid. Method: First, a questionnaire was designed and evaluated by o...

Micro-classification of orchards and agricultural croplands by applying object based image analysis and fuzzy algorithms for estimating the area under cultivation

Remote sensing technology is one of the most efficient and innovative technologies for agricultural land use/cover mapping. In this regard, object-based image analysis (OBIA) is known as a new method of satellite image processing that integrates spatial and spectral information. This approach makes use of spectral, environmental, physical and geometrical characte...

Journal:
  • CoRR

Volume abs/1703.06777

Publication date 2017